Abstract

In face-to-face conversations people with functional limitations and their interlocutors may encounter 
communication problems if there is no assistance available. A similar problem arises when the dialog partner is a 
computer application or web site rather than a human being and its user interface lacks full accessibility. Whether in 
face-to-face communications or in front of a computer, an instant translation or interpreter service, available 
anytime and anywhere, would help to bridge the communication gap between the user with special needs and his 
communication partner. 

This paper describes principles and possible applications of a network-based "translation on demand" service.
It concentrates on services for deaf and hard-of-hearing persons (text-captioning on demand, signing on
demand) and a service for blind and visually impaired persons (description on demand).

Introduction 

Human society relies heavily on communication. This applies to all kinds of conversation, whether it is a
personal conversation, a presentation in a business meeting or a human-computer dialog. If a communication
partner has a functional limitation that prevents him from gaining full access to the provided information, there
should be a translation from one communication mode to another. Thus for deaf or hard-of-hearing persons audio
content must be translated to text or sign language (and often vice versa). Vision-impaired participants may need a
description of a picture or any visual object that is essential in a particular context.

People with functional limitations often experience wide communication gaps in everyday situations. Consider, for
example, a hearing-impaired participant in a meeting where information is mostly exchanged by speech. However,
a sign language interpreter (or, for a visual object, a verbal description) is not always available. In the same meeting
there might be a blind person who has no access to the visual diagram somebody brought in and which is now at
the center of the discussion, unless somebody describes the diagram verbally as the debate goes on. Both the deaf and
the blind participant in this example are precluded from fully participating in the meeting because of their functional
limitations.

A similar problem faced in the emerging "information society" stems from the fact that we are more and more 
dependent on having unrestricted access to online information provided by a public information network. 
Guidelines help in creating accessible web sites and application interfaces for cross-disability access. However, not 
all web sites and applications may conform to these guidelines for a number of reasons. And there is no "guideline
solution" for dynamic content, e.g. live images provided by web cams or live audio streams in distributed 
collaborative environments. 

Translation on Demand 

Basically, an "assistant on demand" is an individual who can be called up to assist someone with a disability
whenever required, but who is not around the rest of the time. The concept of "translation on demand"
provides a network-based translation service that is available anytime and anywhere and may be human or computer based
(or both) [Vanderheiden, 1995] [Vanderheiden, in press]. This paper concentrates on three different types of
translation on demand services provided for different user groups: text-captioning on demand, signing on demand
and description on demand.

Text-Captioning on Demand 

Text-captioning on demand instantly translates speech to text for deaf or hard-of-hearing people. 

Today's speech-recognition software achieves reasonable recognition rates only with a restricted vocabulary or with
speakers for whom the system has been trained beforehand. In many everyday situations where arbitrary people are talking
together this is not applicable. But speech-recognition software can still help in getting an accurate speech-to-text
translation if a dedicated speaker repeats everything that has been said by other people. In fact, this technique works
well even if the dedicated speaker is in a remote location and only connected to the other participants via a wide-area
network. The network conveys acoustic information in one direction and text-based information in the other
direction. This text is then displayed on a screen, hand-held display or special eye glasses (e.g. the Personal
Captioner [PCS]).
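
The re-speaking arrangement described above can be sketched as a simple relay. This is only an illustrative sketch:
the function name caption_relay and the two callables respeak and recognize are hypothetical stand-ins for the
remote dedicated speaker and the speaker-trained recognition software. They do not refer to any real system or
library.

```python
from typing import Callable, Iterable, Iterator


def caption_relay(
    audio_chunks: Iterable[bytes],
    respeak: Callable[[bytes], bytes],
    recognize: Callable[[bytes], str],
) -> Iterator[str]:
    """Yield one caption per audio chunk (illustrative only).

    respeak   -- hypothetical stand-in for the remote dedicated speaker,
                 who repeats what was said at the point of need
    recognize -- hypothetical stand-in for recognition software trained
                 on the dedicated speaker's voice
    """
    for chunk in audio_chunks:
        respoken = respeak(chunk)   # audio travels to the remote speaker
        yield recognize(respoken)   # text travels back to the display


# Usage with trivial stand-ins: the "audio" is already text in bytes form.
captions = list(
    caption_relay(
        [b"good morning", b"let us begin"],
        respeak=lambda audio: audio,
        recognize=lambda audio: audio.decode("ascii"),
    )
)
```

In a real deployment the two callables would hide the network transport in each direction; the sketch only shows
that audio flows one way and text flows back.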

Another technique for a text-captioning service is a trained person typing on a stenographic keyboard connected to 
a computer (e.g. the National Captioning Institute [NCI]). This person could be remotely connected to the customer 
in the same manner as described above. 

Signing on Demand 

Signing on demand is a remote service accessed via a wide-area network. An audio stream from the point-of-need
location can be sent to a remote human sign interpreter. In return, a video stream showing the interpreter's signing
is sent over the network and shown on a screen or hand-held device for the hearing-impaired person(s). In addition,
a video stream from the location to the remote interpreter is needed for sign-to-speech translation. For this direction
the interpreter's speech is transferred to the requesting location by an audio stream. An accompanying video stream
from the location to the remote service may also help the sign interpreter in getting valuable context information for
the translation service.
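
The streams involved in a signing on demand session can be summarized in a short sketch. The class and function
names below (Media, Direction, Stream, signing_streams) are hypothetical and chosen only for illustration. The
sketch merely enumerates which medium flows in which direction, as described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Media(Enum):
    AUDIO = "audio"
    VIDEO = "video"


class Direction(Enum):
    TO_INTERPRETER = "location -> interpreter"
    TO_LOCATION = "interpreter -> location"


@dataclass(frozen=True)
class Stream:
    media: Media
    direction: Direction
    purpose: str


def signing_streams(two_way: bool = True) -> List[Stream]:
    """List the streams of a signing on demand session (illustrative only)."""
    # Speech-to-sign: audio goes out, the interpreter's signing comes back.
    streams = [
        Stream(Media.AUDIO, Direction.TO_INTERPRETER, "speech to be interpreted"),
        Stream(Media.VIDEO, Direction.TO_LOCATION, "interpreter's signing"),
    ]
    if two_way:
        # Sign-to-speech: the participant's signing goes out, speech comes back.
        streams += [
            Stream(Media.VIDEO, Direction.TO_INTERPRETER, "participant's signing and context"),
            Stream(Media.AUDIO, Direction.TO_LOCATION, "interpreter's spoken translation"),
        ]
    return streams
```

A one-way session (speech-to-sign only) needs two streams, while a full conversation needs all four.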

There are several reasons why sign language translation may be preferable to text translation in certain situations.
1) Sign language (particularly American Sign Language) can express information that is conveyed by speech but
cannot be coded in plain text (e.g. emphasis or timing of spoken words). 2) A deaf or hard-of-hearing person may wish
to actively take part in a live spoken conversation using sign rather than text. It is then more natural to watch
signing and respond in signs than to read text and respond in signs. 3) Moreover, not all hearing-impaired people can
read and understand English at conversational speeds. In conclusion, speech-to-sign translation is an appropriate
means where information exchange mainly relies on spoken language or where hearing-impaired participants lack
sufficient reading skills. Signing on demand should provide both translation directions, speech-to-sign and
sign-to-speech.

With emerging speech and image recognition, machine-translation and avatar (computer-generated human-like
character) rendering techniques, this service might be provided in a fully automatic mode. However, it might remain
remote because of the enormous computing power required. Thus an audio stream would be transferred to a
remote service application, which in return would send movement commands for a signing avatar rendered on
a screen or hand-held display. In the other direction, the video image of a signing person would be analyzed by the
remote service. The sign language would be translated to synthesized speech and sent back as an audio stream
to the requesting location.

Description on Demand 

Description on demand translates visual information into verbal form for blind and visually impaired persons where 
no other verbal description is available. 

This service is provided by a person remotely connected to the requesting person. For this purpose a video stream (or
image) showing the visual environment or object to be described must be provided to the service. The verbal
description delivered by the service personnel is then sent over the network and delivered to the requestor through
speakers, headphones or an earbud.

Variations of Application 

There is an almost infinite number of possible applications for the translation on demand service. This section will 
only briefly mention some highlights in order to describe the potential of this service. 

Remote Signing for Public Events 

At the SuperComputing 99 conference, which was held at Portland's Oregon Convention Center in November 1999, 
the Trace R&D Center [Trace] demonstrated the feasibility of real-time sign language translation over a high-speed 
internet for a wide audience [Barnicle et al., 2000]. The audio stream of the plenary session was sent to the remote
Trace Center in Madison, Wisconsin, where human sign interpreters provided an instant translation service for the 
conference. The signing was captured by video and sent back over Internet II to the conference location where it 
was rendered on a large screen in front of the room. 

In this manner a cost-effective high-quality sign language translation service can be provided anytime and 
anywhere via a high-bandwidth global network.

Text-Captioning via Special Eye Glasses 

Special eye glasses with a monitor built-in from Personal Captioning Systems [PCS] provide discreet personal 
captioning. The text captions are provided through wireless transmission and the words seem to "float" about 18 
inches in front of the eye. This system can be used in community and social activity locations where an audio signal 
is easily available (e.g. theatres, movie theatres, conference venues).

Moreover, in a hypothetical scenario, a hearing-impaired person could use this system in a more personal manner 
during conversations or meetings with hearing people. The person could hold a pen-like device with a built-in 
microphone to wirelessly feed a remote service with the environmental sound [Vanderheiden, 1995]. Any spoken
word contained in the sound would then be translated into text and instantly shown on his eye glasses. This device 
would provide individuals who are deaf with the ability to carry on face-to-face conversation with anyone else who 
might be talking to them. However this scenario presumes that the hearing-impaired person himself has the 
capability to speak. 

In September 2000, a new service called "Instant Captioning" was announced by Ultratec. Currently in field
testing, it will provide a text translation capability using different technology formats [Ultratec].

Remote Description via Earbud 

For blind or visually impaired people it would be a great help if they could call for translation on demand anytime
and anywhere. In a hypothetical scenario, if a blind individual became disoriented in an unknown environment, a
small wireless camera built into his cane could transfer a 360 degree view of the current environment to a
description service. A verbal description of the environment would then be sent to the earbud worn by the blind
person. With the same camera, the service could also deliver a spoken description of a slide presentation in a business
meeting.

Remote Signing in a Collaborative Environment 

A collaborative environment is a network of hardware and software systems supporting distributed teams' 
conversation and collaboration. As an example such a network can facilitate a common meeting of different teams 
at different locations sharing an electronic white-board and electronic documents. 

Collaborative environments are an ideal configuration for a remote sign language translation service. With their 
network of (wirelessly) connected devices (video cams, large screens, microphones, speakers, electronic
white-boards) they offer a complete platform for audio and video capture and rendering. In this scenario the remote
location of the sign interpreter is just another node in the collaborative environment and audio and video signals are 
passed back and forth to this node. The video image of the sign interpreter can be displayed in a section of the large 
display or, more discreetly, on a personal laptop or hand-held device, wirelessly connected to the underlying 
network. 

This scenario allows a hearing-impaired participant to hold a real-time sign language conversation in both directions.
The remote sign interpreter translates other participants' speech to sign and the hearing-impaired participant's
signing to speech.

In the future a signing avatar as part of the collaborative environment system could automatically translate spoken 
content into sign language on demand. Also a video-based sign language recognition system could transform 
signing from a hearing-impaired participant into audio for hearing participants. 

Built-in Translation on Demand for Computational User Interfaces 

A computer user with a functional limitation will from time to time encounter (partly or totally) inaccessible user
interfaces and web sites. This does not necessarily mean that the developers have been thoughtless or even ignorant.
It might also be caused by the dynamic nature of the provided content. In this case it would be helpful for the
user to press a dedicated button (or speak a special command) to launch a text translation, sign translation or video
description application. This application would connect his computer to a remote location where the appropriate
interpretation service would be provided with or without human assistance.

Thus a text translation service could capture the audio stream of the requesting user's computer and display its 
content in text form in a separate window. Or the signing service would open a window showing a sign interpreter 
(or a signing avatar) translating the content of the computer's audio signal into signs. Or the description service 
would provide a verbal (spoken) description of inaccessible visual content, e.g. diagrams, images and videos. 
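
The choice among the three services behind such a button can be sketched as a simple dispatch. The names used here
(Service, select_service, the content labels "audio" and "visual", and the prefers_sign flag) are hypothetical and
serve only to illustrate the mapping described above. They are not taken from any existing implementation.

```python
from enum import Enum


class Service(Enum):
    TEXT_CAPTIONING = "text-captioning on demand"
    SIGNING = "signing on demand"
    DESCRIPTION = "description on demand"


def select_service(content: str, prefers_sign: bool = False) -> Service:
    """Pick a translation service for inaccessible content (illustrative only).

    content      -- "audio" for inaccessible audio, "visual" for
                    inaccessible diagrams, images or videos
    prefers_sign -- hypothetical user preference for sign over text
    """
    if content == "audio":
        # Audio is translated either into sign or into text.
        return Service.SIGNING if prefers_sign else Service.TEXT_CAPTIONING
    if content == "visual":
        # Visual content is translated into a spoken description.
        return Service.DESCRIPTION
    raise ValueError(f"unknown content type: {content!r}")
```

Whether the selected service is then delivered by a human or by software is a separate decision that this sketch
leaves open.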

Built into all browsers and into every operating system's graphical user interface, this service button could be the
anchor to a powerful "safety net" for unexpected "accessibility crashes". The Trace R&D Center currently works
toward such a globally available translation on demand service for the Grid, the next-generation high-speed Internet 
matrix of services. Being part of the Partnership for Advanced Computational Infrastructure [PACI] funded by the 
National Science Foundation (NSF), Trace aims to harness today's and tomorrow's high-tech solutions for a 
globally available translation service on the Grid. 